Method for recording, parsing, and transcribing deposition proceedings

ABSTRACT

Techniques for accurately recording sworn deposition testimony without use of a court reporter are described herein. According to these techniques, participants in a deposition or other legal proceeding are identified in such a manner that speech in one or more audio files representing the deposition can be associated with the respective participants. The association of participants with recorded speech is used to automatically generate an accurate transcript sequentially reflecting what was said at the deposition proceeding and by which of the respective participants.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. appl. Ser. No. 15/963,683, titled “SYSTEM AND METHOD FOR AUTOMATED LEGAL PROCEEDING ASSISTANT”, filed on Apr. 26, 2018, which claims the benefit of U.S. Provisional Application No. 62/491,705, titled “SYSTEM AND DETECTING AND PARSING CONTEMPORANIOUS SPEECH EVENTS FROM A PLURALITY OF AUDIO INPUTS”, filed on Apr. 28, 2017 in the United States of America, both of which are incorporated herein by reference. A claim of priority is made.

FIELD OF THE INVENTION

This disclosure is directed to audio recording and processing techniques, and more specifically to techniques for converting speech to text.

BACKGROUND

In a typical legal proceeding such as a trial or deposition, a court reporter is employed who administers oaths, listens to individual speakers who are a party to the legal proceeding (both attorneys and witnesses) and captures stenographically what is said and by whom. Using a court reporter to capture spoken language in a legal proceeding may suffer from drawbacks. For example, a court reporter may be expensive to employ and sometimes inaccurate. In addition, a court reporter may not efficiently complete transcripts of a legal proceeding, leading to delays.

SUMMARY

This disclosure is directed to systems, methods, and techniques providing an automated legal proceeding assistant. In one example, a method is described herein. The method includes recording, using each microphone of a plurality of microphones, the content of a deposition. The content of the deposition comprises a plurality of speech segments recorded by the plurality of microphones, wherein each of the plurality of microphones is associated with a deposition participant of a plurality of deposition participants. The method further includes identifying, based on which microphone of the plurality of microphones each speech segment was recorded by, which deposition participant of the plurality of deposition participants is associated with each speech segment. The method further includes generating, based on which deposition participant of the plurality of deposition participants is identified as associated with each speech segment, a document comprising a transcript of the deposition. The transcript comprises a sequential identification of what content was spoken in each speech segment in written text, and which deposition participant of the plurality of deposition participants spoke the content in each speech segment.

As another example, a system is described herein. The system includes at least one microphone. The system further includes a user interface device accessible to at least one of a plurality of deposition participants. The system further includes an audio translation engine. The audio translation engine includes an audio storage module configured to store at least one representation of audio recorded by the at least one microphone during a deposition proceeding. The audio translation engine further includes a speaker identification module configured to identify, in the audio recording, which of the plurality of deposition participants spoke one or more portions of the recorded audio. The audio translation engine further includes a speech-to-text module configured to convert speech in the recorded audio into a textual representation of the speech. The audio translation engine further includes a transcript generator module configured to generate a document representing a transcript of the deposition based on the converted speech and the identification of which of the plurality of deposition participants spoke the one or more portions.

According to another example, a system is described herein. The system includes at least one microphone. The system further includes a user interface device accessible to at least one of a plurality of deposition participants. The system further includes an audio translation engine. The audio translation engine includes audio storage means that store at least one representation of audio recorded by the at least one microphone during a deposition proceeding. The audio translation engine further includes speaker identification means that identify, in the audio recording, which of the plurality of deposition participants spoke one or more portions of the recorded audio. The audio translation engine further includes speech to text means that convert speech in the recorded audio into a textual representation of the speech. The audio translation engine further includes transcript generation means that generate a document representing a transcript of the deposition based on the converted speech and the identification of which of the plurality of deposition participants spoke the one or more portions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a conceptual diagram depicting one example of an automated legal proceeding assistant consistent with one or more aspects of this disclosure.

FIG. 2 is a block diagram depicting components of an automated legal proceeding assistant system consistent with one or more aspects of this disclosure.

FIGS. 3A-3C are conceptual diagrams depicting the recording of speech from deposition participants to generate a transcript consistent with one or more aspects of this disclosure.

FIG. 4 is a conceptual diagram depicting one example of recording of speech from deposition participants to generate a transcript consistent with one or more aspects of this disclosure.

FIG. 5 is a conceptual diagram depicting one example of audio processing to generate a transcript consistent with one or more aspects of this disclosure.

FIG. 6 is a conceptual diagram depicting one example of data that may be stored by a server consistent with one or more aspects of this disclosure.

FIG. 7 is a flow diagram depicting one example of a method of automatically generating a legal proceeding transcript consistent with one or more aspects of this disclosure.

FIG. 8 is a block diagram illustrating a computing environment in which respective components of an automated legal proceeding assistant system may operate consistent with one or more aspects of this disclosure.

DETAILED DESCRIPTION

FIG. 1 is a conceptual diagram illustrating one example of an Automated Legal Proceeding Assistant (ALPA) system 100 according to one or more aspects of this disclosure. ALPA system 100 is an automated system that provides assistance that simplifies a legal proceeding, such as a trial or deposition, for participants in the legal proceeding. For example, ALPA system 100 may enable the participants, for example deponents, attorneys, judges, and the like, to swear in, automatically record testimony, generate transcripts, and provide a smooth and seamless process to enable resolution of ambiguities in generated transcripts to create a final, official transcript of the legal proceeding sufficient to serve as evidence, if necessary. In some examples, ALPA system 100 may advantageously perform some functions typically performed by a human court reporter.

System 100 described herein improves efficiency by eliminating the time-lag on receiving deposition transcripts. In some examples, ALPA system 100 creates a revenue stream for attorneys, law firms, and companies who perform depositions, as they can now charge for the product (over and above any billable time) while eliminating paying a court reporter for her or his time performing transcription and for the deposition transcript itself, for any expedited transcript production, for the editing of a transcript for accuracy, and for the treatment of documents referenced during the deposition, such as exhibits. Using a court reporter will be more expensive for a client than using ALPA system 100. Thus, ALPA system 100 may provide attorneys or law firms a selling point for their clients (or can save money if they in-house their depositions).

The examples described are directed to a deposition legal proceeding; however, one of skill in the art will recognize that the techniques described herein may be applicable to any type of legal proceeding that requires generation of reliable transcripts reflecting the content of what was said, and by whom, during the legal proceeding.

As shown in FIG. 1, ALPA system 100 includes an audio translation engine 107, at least one microphone 105, and at least one user interface 109A, 109B. ALPA system 100 utilizes one or more of microphones 105 to detect, capture, transmit and record sounds, including voices. The microphones 105 can be any one of numerous such devices known in the art, such as standalone microphones (whether “wired” or wireless) or devices that incorporate microphones or other audio technology, such as computers (laptops, smart phones, iPads) and the like.

As shown in FIG. 1, microphone(s) 105 are arranged to capture recordable audio of participants in a deposition proceeding. As shown, microphone 105 is arranged to capture audio reflecting statements made orally by deposer 103A, as well as deponent 103B.

As also shown in FIG. 1, system 100 includes an audio translation engine 107. Audio translation engine 107 receives (directly or indirectly) from microphone 105 digital or other data reflecting audio recordings of oral statements and other audible sounds made by deposer 103A and deponent 103B in the course of a deposition proceeding. Audio translation engine 107 stores, for example in temporary memory such as Random Access Memory (RAM), or long term storage such as a magnetic hard disk or other long-term storage device (or, in other embodiments, otherwise accesses electronically) the received data reflecting audio recordings, and processes the data to generate a transcript 113 reflecting the orally communicated content of the deposition proceeding. Audio translation engine 107 generates the transcript 113 to include all (or substantially all) statements made by participants 103A, 103B on the record during the course of the deposition, with each statement identified based on who said the statement in a sequential or substantially sequential manner.

In addition, ALPA system 100 includes user interfaces 109A, 109B. User interfaces 109A-109B enable users, such as participants of the legal proceeding, and/or non-participants running the legal proceeding (administrator, paralegal, etc.), to interact with system 100 during a deposition. For example, user interfaces 109A, 109B may each comprise a computing device (laptop, smartphone, tablet computer) with a display and some form of input means (keyboard, mouse, touch-screen) for a user to receive information from system 100 and/or to provide input to system 100.

As shown in FIG. 1, audio translation engine 107 is coupled to a network 111, such as the internet. Network 111 enables communication between audio translation engine 107 and user interfaces 109, as well as with other components of system 100 not depicted in FIG. 1. For example, although not depicted in FIG. 1, system 100 may include one or more remote computing devices such as server computers accessible via network 111 that store data and/or execute instructions associated with audio translation engine 107, user interfaces 109, or both.

FIG. 2 is a block diagram depicting one example of an Automated Legal Proceeding Assistant (ALPA) 200 according to one or more aspects of this disclosure. As shown in FIG. 2, ALPA 200 includes an audio translation engine 207, at least one microphone 105, and at least one user interface 109. Microphone 105 includes any device or devices configured to capture an audio recording. User interface 109 includes any device that enables users, such as participants in a legal proceeding, to interact with ALPA system 200, for example to provide input or receive feedback from ALPA system 200.

As shown in FIG. 2, audio translation engine 207 includes an audio storage module 230, a speaker identification module 232, a speech to text module 234, and a transcript generator module 240. As described herein, each of modules 230, 232, 234, 240 includes software instructions stored in a tangible storage medium and executable by a processor of a computing device. In some examples, each of modules 230, 232, 234, 240 is executable on a computing device local to where a legal proceeding such as a deposition takes place. For example, one or more of modules 230, 232, 234, 240 may execute on a device that serves as user interface 109, which may be a smartphone, tablet, laptop computer, desktop computer, or the like. In other examples, one or more of modules 230, 232, 234, 240 include software instructions executable on a processor of one or more computing devices located remotely, such as one or more server computing devices coupled to audio translation engine 207 over a network such as the internet.

In operation, ALPA system 200 allows a user to initiate the deposition proceeding. As an example, ALPA system 200 provides a user with a visual indication, such as through a display of user interface 109, with an option to commence the deposition proceeding. In advance of, or contemporaneously with, the start of a deposition, the ALPA system 200 requests or permits the identification of deposition participants. Deposition participants may include one or more deponents, one or more deposing attorneys, one or more representing attorneys who represent the deponent in the deposition, or one or more other participants, such as witnesses or, in the course of courtroom proceedings, judges or magistrates or other court personnel. ALPA system 200 may also request or permit the input of other information associated with the deposition, such as a court case number, attorney docket number, filing date, or other information that identifies the subject matter of the deposition proceeding. ALPA system 200 may also request or permit the input, through a user interface 109, of any other information that is typically reflected in a deposition transcript, including information associated with the confidentiality level or presumed confidentiality level of the subject matter of the proceeding, information regarding individuals present but not speaking at the deposition, the location of the deposition, or the law firms and companies represented by individuals present, in person or telephonically, at the deposition (whether speaking or assigned a microphone or not).

In some examples, ALPA system 200 will execute an initialization procedure to prepare for recording and generating a transcript of the deposition proceeding. As part of the initialization procedure, ALPA system 200 determines a list of participants in such a manner that system 200 may differentiate between different speakers during the deposition proceeding, so that an accurate transcript can be generated. For this purpose, audio translation engine 207 includes a speaker identification module 232, which identifies respective participants of the deposition.

In some examples, ALPA system 200 includes a plurality of microphones 105, each of which is assigned to a particular deposition participant. According to these examples, speaker identification module 232 uses the microphone assignments themselves to associate recorded audio with a particular speaker. For example, each participant may wear, or keep in close proximity, a microphone 105. As examples, the participants may wear a microphone (e.g., secured to a user's shirt collar, earpiece, etc.), or may use a computing device including a microphone, such as a smartphone or tablet, or a standalone microphone device arranged in proximity to the participant.
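As a minimal illustrative sketch of the microphone-based attribution described above, the following Python fragment maps each recorded speech segment to a participant using only the microphone assignments. The names `SpeechSegment`, `attribute_segments`, the microphone identifiers, and the "UNIDENTIFIED" label are hypothetical and are not part of this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechSegment:
    microphone_id: str    # hypothetical identifier of the recording microphone
    start_seconds: float  # offset of the segment from the start of the recording
    audio_ref: str        # hypothetical handle to the stored audio


def attribute_segments(segments, mic_assignments):
    """Pair each segment with the participant assigned to its microphone."""
    attributed = []
    for segment in segments:
        # A microphone with no assignment yields a placeholder, not a guess.
        participant = mic_assignments.get(segment.microphone_id, "UNIDENTIFIED")
        attributed.append((participant, segment))
    return attributed


mic_assignments = {"mic-1": "Deposing attorney", "mic-2": "Deponent"}
segments = [
    SpeechSegment("mic-1", 0.0, "seg-0001"),
    SpeechSegment("mic-2", 4.2, "seg-0002"),
    SpeechSegment("mic-9", 9.8, "seg-0003"),
]
attributed = attribute_segments(segments, mic_assignments)
```

In this sketch, a segment from an unassigned microphone is labeled rather than dropped, so that it can be resolved later by other means.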

According to these examples, system 200 may prompt participants, via user interface(s) 109, to speak a word or phrase, such as their name. Speaker identification module 232 may then determine whether it can accurately identify the spoken voice of each participant speaker. In some examples, if speaker identification module 232 is unable to accurately separate one speaker from another, speaker identification module 232 may request, via user interface(s) 109, that one or more participants change their microphone configuration. For example, speaker identification module 232 may request that one or more participants move further away from other participants, or that one or more participants use a different microphone.

According to some other examples, ALPA system 200 may not rely solely on assigned microphones 105 to distinguish different speaker participants from one another. According to these examples, ALPA system 200 may instead of, or in addition to, identifying speakers based on a microphone that recorded audio, process the captured audio (e.g., audio captured from one microphone only that captures multiple deposition participants, or in another embodiment from several microphones 105) to identify respective speakers in audio recordings. According to these examples, speaker identification module 232 identifies speaker participants based on a number of factors alone or in combination, including voice pitch height, pitch modulation, pitch range, speech rate, fluency, vocabulary, grammar, usage and other speech patterns or other data. Additionally, speaker identification module 232 may identify a user by other vocal traits, including measurements of the speaker's use of vowels, including (for example) average and standard deviation for fundamental frequency; period to period frequency; period to period amplitude variation; and GNE (glottal to noise excitation ratio), as examples.
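To make some of the vowel measurements listed above concrete, the following Python sketch computes the average and standard deviation of fundamental frequency, together with simple period to period variation measures. It assumes that per-cycle fundamental frequency estimates and peak amplitudes have already been extracted by some other means. The function names and the particular normalization (mean absolute difference between consecutive cycles, divided by the mean) are assumptions chosen for illustration, and GNE is omitted.

```python
from statistics import mean, pstdev


def relative_variation(values):
    """Mean absolute difference between consecutive values, divided by the mean."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def vowel_features(f0_hz, peak_amplitudes):
    """Summarize per-cycle pitch and amplitude measurements for one vowel."""
    periods = [1.0 / f for f in f0_hz]  # cycle durations in seconds
    return {
        "f0_mean_hz": mean(f0_hz),
        "f0_std_hz": pstdev(f0_hz),
        "period_variation": relative_variation(periods),
        "amplitude_variation": relative_variation(peak_amplitudes),
    }


# Hypothetical measurements for four consecutive glottal cycles.
features = vowel_features([100.0, 110.0, 100.0, 110.0], [0.5, 0.5, 0.5, 0.5])
```

A feature dictionary of this kind could be stored as part of a speaker profile and compared against features extracted from later recordings.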

According to these examples, speaker identification module 232 is configured to store one or more speaker profiles in memory or access existing profiles of known speakers from prior depositions (as an example). According to these examples, during an initialization procedure of ALPA 200, speaker identification module 232 requests, using user interface(s) 109, that each participant to the deposition identify themselves, for example through spoken word, or text input via user interface(s) 109, or via other means. Speaker identification module 232 then determines whether it has access to a stored profile for each deposition participant sufficient to identify them based on recorded speech. If speaker identification module 232 does not include a stored profile for a deposition participant, it may request that the missing participant supply information allowing speaker identification module 232 to create a profile. For example, speaker identification module 232 may, via user interface(s) 109, request that the missing participant speak several predefined words or phrases from which speaker identification module 232 can extract one or more speech parameters or properties to generate a profile for that user.

In some examples, speaker identification module 232 may be generally configured to utilize identification of a microphone or microphones that captured audio to identify which deposition participant is associated with recorded audio segments, but may utilize processing to identify speaker(s) based on stored user profiles as a fail-safe. For example, system 200 may include a plurality of microphones each assigned to a deposition participant, and one or more “fail-safe” microphones not assigned to a particular deposition participant but arranged to capture audio during a proceeding. According to such examples, if for some reason speaker identification module 232 is unable to identify a speaker associated with an audio segment, speaker identification module 232 may process audio recorded by the fail-safe microphone(s) to identify speakers associated with the recorded audio.
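The fail-safe behavior described above can be sketched as a two-stage lookup: the microphone assignment is tried first, and a stored profile is consulted only when that fails. In the following Python fragment, each profile is reduced to a single mean fundamental frequency purely for brevity. The function name, the profile representation, and the 15 Hz matching threshold are hypothetical.

```python
def identify_speaker(mic_id, mic_assignments, failsafe_f0_hz, profiles,
                     max_distance_hz=15.0):
    """Return (speaker, method): microphone assignment first, profile second."""
    speaker = mic_assignments.get(mic_id)
    if speaker is not None:
        return speaker, "microphone"
    if failsafe_f0_hz is None or not profiles:
        return None, "unresolved"
    # Fall back to the stored profile closest to the fail-safe measurement.
    best = min(profiles, key=lambda name: abs(profiles[name] - failsafe_f0_hz))
    if abs(profiles[best] - failsafe_f0_hz) <= max_distance_hz:
        return best, "profile"
    return None, "unresolved"


mic_assignments = {"mic-1": "Deposing attorney"}
profiles = {"Deposing attorney": 118.0, "Deponent": 205.0}  # mean F0 in Hz

by_mic = identify_speaker("mic-1", mic_assignments, None, profiles)
by_profile = identify_speaker(None, mic_assignments, 198.0, profiles)
unresolved = identify_speaker(None, mic_assignments, 160.0, profiles)
```

Returning the method alongside the speaker lets downstream components treat profile-based matches with more caution than microphone-based ones.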

In some examples, whether speaker identification module 232 is configured to identify respective speaker participants of the deposition proceeding based on microphone 105 assignments, or based on processing captured audio to determine an identity of respective speaker participants based on comparison to a predefined profile, or both, as part of the initialization procedure speaker identification module 232 determines whether each deposition participant is a valid deposition participant whose speech may be identified in audio recordings. In some embodiments, the speaker identification module may identify, during the course of a deposition, the speech of someone not pre-identified as being a participant in the deposition, but may nevertheless, and in conjunction with system 200, record and translate their speech events.

In some embodiments, information solicited by the initialization procedure of ALPA 200 will be input prior to the deposition through user interface 109, and as a result, the deposition participants will not need to enter information or establish a user profile for use by speaker identification module 232 as part of the deposition proceeding itself. For example, in advance of the deposition, a legal assistant or other user may pre-enter information, including the names of the participants and the firms or companies they represent, link the participants with any pre-existing voice profiles if one or more deposition participants have previously used system 200, and input the location of the deposition, the case name and caption, the deponent name, etc. In some cases, such information will be entered well in advance of the deposition proceeding itself. In this manner, deposition participants, and other users, may proceed immediately with the deposition proceeding itself, which may beneficially save time.

In some examples, as part of the initialization procedure, system 200 requests that required participants of the proceeding take an oath. Accordingly, system 200 outputs audio instructions or presents on a display (of user interface 109) a textual description of the oath, and requests signatures or the traditional vocal assent to proceed under oath from the required participants. In some examples, signatures may be received via the user(s) writing their signatures on a touch-screen display of user interface 109.

Once speaker identification module 232 has completed the initialization procedure so that it is prepared to identify the source of spoken word for each identified participant in an audio recording, the deposition proceeding may commence. Accordingly, ALPA 200 may, via user interface(s) 109, request confirmation from one or more participants that the deposition should commence.

Once ALPA 200 receives an indication that the deposition should commence, the parties may commence the deposition. For example, the deposing attorney may ask questions of the deponent, the deponent may answer, and the deponent's attorney may interject with objections or the like.

As the deposition proceeds, audio storage module 230 receives an output signal from microphone(s) 105, and stores one or more audio recordings representing what was said at the deposition in memory. For example, audio storage module 230 may compress received audio recordings to reduce size, encrypt received audio recordings to ensure security, or otherwise process audio recordings. In some examples, audio storage module 230 stores a single audio recording that represents an entire deposition. In other examples, audio storage module 230 stores a plurality of audio files that represent captured audio from multiple microphones 105. In some examples, audio storage module 230 stores audio recordings with a plurality of timestamps that identify when a particular recording was made.

In some examples, as audio storage module 230 operates to store recorded audio, speaker identification module 232 analyzes recorded audio (e.g., based on which microphone 105 recorded the audio, or based on matching with stored user profiles as described above), so that each audio recording is stored by audio storage module 230 with a corresponding identification of the source of the recording. In some examples, audio storage module 230 stores audio recordings on a memory storage device (e.g., Random-Access-Memory, hard disk storage, flash memory storage) on a computing device local to the deposition proceeding, such as user interface(s) 109. In other examples, audio storage module 230 stores audio recordings on a computer server located elsewhere and connected via a network such as the internet.

In some examples, audio storage module 230 is operable to establish confidentiality for stored audio recordings. According to these examples, audio storage module 230 may store recorded audio with one or more confidentiality markers that system 200 may use to ensure that each party (e.g., a respective deposition participant) may access only the information, such as audio recording(s), that the party is authorized to access.
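One way to picture confidentiality markers is as an ordered level stored with each record and compared against a requesting party's clearance. The following Python sketch assumes such a tiered scheme. The level names, the `accessible_ids` function, and the record identifiers are hypothetical, and other marker schemes (for example, per-party access lists) would serve equally well.

```python
from enum import IntEnum


class Marker(IntEnum):
    """Hypothetical ordered confidentiality levels."""
    OPEN = 0
    CONFIDENTIAL = 1
    ATTORNEYS_EYES_ONLY = 2


def accessible_ids(records, clearance):
    """Return the ids of records whose marker does not exceed the clearance."""
    return [r["id"] for r in records if r["marker"] <= clearance]


records = [
    {"id": "audio-001", "marker": Marker.OPEN},
    {"id": "audio-002", "marker": Marker.CONFIDENTIAL},
    {"id": "exhibit-7", "marker": Marker.ATTORNEYS_EYES_ONLY},
]
deponent_view = accessible_ids(records, Marker.CONFIDENTIAL)
counsel_view = accessible_ids(records, Marker.ATTORNEYS_EYES_ONLY)
```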

In some examples, system 200 may be configured to control access by assigning confidentiality markers to other data used by system 200, for example identification of deposition participants or other parties to a court proceeding, exhibits, user voice profiles, or any other data used by system 200. In this manner, system 200 may enable respective parties to easily access data or information they are allowed to access, while maintaining the confidentiality that would normally be maintained in a traditional court or deposition proceeding.

As also depicted in FIG. 2, ALPA 200 further includes a speech-to-text (STT) module 234. STT module 234 analyzes audio recordings stored by audio storage module 230 to convert the content of spoken word to written text that may be used to generate a transcript of the deposition proceeding. STT module 234 may include one or more executable software modules that are configured to analyze an audio recording to identify features in the recording that enable STT module 234 to output one or more text files that represent what was said in the audio recording(s).

Speaker identification module 232 further operates to identify, in audio recordings stored by audio storage module 230, a speaker source for each word or phrase. As described above with respect to the initialization phase, in some examples speaker identification module 232 identifies speakers based on which of a plurality of microphones recorded particular audio (or recorded the audio the loudest). In other examples, speaker identification module 232 uses one or more stored profiles representing deposition participants in order to identify a speaker in recorded audio. In other examples, speaker identification module 232 identifies speakers in recorded audio based on both an assigned microphone and one or more stored profiles.
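When several microphones pick up the same utterance, selecting the microphone that recorded the audio the loudest can be sketched by comparing signal levels over the same time window. The Python fragment below uses root-mean-square (RMS) level as the loudness measure, which is an assumption made for illustration. The function names and sample values are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudest_microphone(frames_by_mic):
    """Return the id of the microphone with the highest RMS level."""
    return max(frames_by_mic, key=lambda mic: rms(frames_by_mic[mic]))


# Hypothetical samples captured by two microphones over the same window.
frames_by_mic = {
    "mic-1": [0.1, -0.1, 0.1],
    "mic-2": [0.6, -0.5, 0.7],
}
loudest = loudest_microphone(frames_by_mic)
```

The selected microphone identifier can then be resolved to a participant through the microphone assignments.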

As also shown in FIG. 2, ALPA 200 further includes an exhibit module 236. Exhibit module 236 is configured to manage exhibits as part of the deposition proceeding, such that the exhibits are easily accessible by participants in the deposition, and such that their use may be reflected in a generated transcript. For example, prior to or during a deposition proceeding, a participant or other user (e.g., legal assistant or paralegal), may submit to system 200 via user interface 109 one or more documents that are identified as exhibits associated with a deposition proceeding or case. During a deposition proceeding, exhibit module 236 may make one or more submitted exhibit documents available to the deposition participants, for example via a display of user interface(s) 109. Exhibit module 236 may capture data associated with use of the exhibit. For example, exhibit module 236 may capture a timestamp associated with presentation of each exhibit document, and/or may associate the presentation of the exhibit with audio files, or portions of audio files, that were captured while the exhibit was being presented to the deposition participants. In this manner, data associated with presentation of exhibit documents may be used to generate a transcript that reflects the discussion of the exhibit documents.
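The association between an exhibit presentation and the audio captured while it was presented can be sketched as an interval-overlap test on timestamps. In the following Python fragment, the tuple layouts, the function name `segments_during`, and the times (in seconds from the start of the recording) are hypothetical.

```python
def segments_during(presentation, segments):
    """Return ids of audio segments that overlap an exhibit presentation.

    presentation: (shown_at, hidden_at) timestamps in seconds.
    segments: iterable of (segment_id, start, end) tuples in seconds.
    """
    shown_at, hidden_at = presentation
    return [
        seg_id
        for seg_id, start, end in segments
        if start < hidden_at and end > shown_at  # the two intervals overlap
    ]


segments = [
    ("seg-1", 0.0, 5.0),
    ("seg-2", 5.0, 12.0),
    ("seg-3", 12.0, 20.0),
    ("seg-4", 20.0, 26.0),
]
exhibit_a = (10.0, 20.0)  # exhibit displayed from 10 s to 20 s
linked = segments_during(exhibit_a, segments)
```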

As also shown in FIG. 2, ALPA 200 further includes a transcript generation module 240. Transcript generation module 240 is operable to receive the output of STT module 234, as well as the output of speaker identification module 232 and exhibit module 236, to generate a transcript that accurately reflects the deposition proceeding including what was said during the deposition proceeding, who said it, and what exhibits were discussed during the deposition. For example, transcript generation module 240 receives text from speech to text module 234 reflecting what was said in one or more recordings stored by audio storage module 230, an indication of which deposition participant spoke the words associated with the received text from speaker identification module 232, and/or an identification of one or more exhibit documents that were presented and discussed during the deposition, and when they were presented and discussed. Transcript generator 240 may review timestamps or other information contained in stored audio, and piece together a transcript reflecting sequentially the content of what was said, and by whom, during the deposition proceeding. Transcript generator 240 may also use additional information in generating a transcript, for example, when the parties went on and off the record (e.g., reflecting breaks in a deposition proceeding such as a lunch break or overnight break when a deposition proceeding spans multiple days), the text of an oath administered to deposition participants, and information that is reflected in a cover page of the transcript, such as identification of a court case number, attorney docket numbers, participant names, law firms involved, an administrator's name, etc.
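The sequential assembly described above can be illustrated with a short Python sketch that merges speaker-attributed text and exhibit events by timestamp. The function name `build_transcript`, the input tuple layouts, and the output line format are hypothetical and are shown only to make the ordering step concrete.

```python
def build_transcript(utterances, exhibit_events=()):
    """Merge utterances and exhibit events into timestamp order.

    utterances: iterable of (timestamp_seconds, speaker, text).
    exhibit_events: iterable of (timestamp_seconds, exhibit_label).
    """
    entries = [(t, f"{speaker.upper()}: {text}") for t, speaker, text in utterances]
    entries += [(t, f"(Exhibit {label} presented.)") for t, label in exhibit_events]
    entries.sort(key=lambda entry: entry[0])  # order strictly by timestamp
    return "\n".join(line for _, line in entries)


# Utterances may arrive out of order, e.g. from separate microphone files.
utterances = [
    (7.5, "Deponent", "Yes, I did."),
    (2.0, "Deposing attorney", "Did you sign this document?"),
]
transcript = build_transcript(utterances, exhibit_events=[(1.0, "A")])
```

Because ordering depends only on timestamps, the same step applies whether the audio was stored as a single recording or as separate files per microphone.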

In some examples, transcript generator 240 may generate portions of a transcript in real-time during a deposition proceeding. According to these examples, as audio storage module 230 receives and stores audio data from microphone(s) 105, STT module 234 converts the stored audio data into a text representation, and speaker identification module 232 associates a deposition participant with each converted text representation. At the same time, transcript generator 240 sequentially generates transcript portions as the deposition proceeding takes place. In some examples, by sequentially generating transcript portions in real time, transcript generator 240 can quickly generate a transcript of the deposition that is available to the deposition participants immediately upon conclusion of the deposition proceeding. In some examples, the initial transcript generated upon conclusion of the deposition may be a “rough” version of the transcript that includes some errors. System 200 may be configured to enable deposition participants to resolve such errors, as described in further detail below.

In some examples, transcript generator 240 is operable to, while a deposition proceeding is taking place, output via user interface(s) 109, generated transcript portions for real-time review by participants. According to these examples, transcript generator 240 may receive from a user confirmation and/or updates to generated transcript portions during the course of the deposition. In some such examples, providing for real-time review of transcript portions during the course of a deposition may enable transcript generator 240 to generate a final transcript accepted by all deposition participants faster than if review of a generated transcript and resolution of ambiguities in a generated transcript take place after a deposition proceeding has concluded.

In some examples, system 200 may be configured to notify deposition participants when the deposition proceeding is “in-session” and testimony is being recorded. For example, system 200 may use user interface(s) 109 to notify deposition participants when a deposition has commenced, when paused, and when complete via a display screen of the user interface(s). In other examples, system 200 may include a light such as a light emitting diode (LED) device coupleable to system 200 via user interface(s) 109. As one specific example, such a light device may comprise a red light and a green light. System 200 may operate the green light when the deposition is in progress and audio is recorded by microphone(s) 105, and operate the red light when the deposition is paused, has completed, or is otherwise not in-session.
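The red and green light behavior described above amounts to a mapping from session state to indicator color, which the following Python sketch makes explicit. The state names and the `indicator_color` function are hypothetical, and driving an actual LED device is outside the scope of the sketch.

```python
from enum import Enum


class SessionState(Enum):
    """Hypothetical states of a deposition session."""
    NOT_STARTED = "not started"
    IN_SESSION = "in session"
    PAUSED = "paused"
    COMPLETE = "complete"


def indicator_color(state):
    """Green only while testimony is being recorded; red in every other state."""
    return "green" if state is SessionState.IN_SESSION else "red"


colors = {state.name: indicator_color(state) for state in SessionState}
```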

Upon completion of the deposition (e.g., as indicated by a deposition participant), transcript generator 240 generates a document that includes a transcript that generally reflects what was stated during the deposition by the deposition participants. Once the transcript has been generated, it may be sent to each participant in the deposition, such as the deponent and respective attorneys, via user interface(s) 109 (e.g., a smartphone or tablet) for accuracy review and ultimately final approval.

In some examples, ALPA system 200 is configured to resolve any ambiguities in the generated deposition transcript. For example, ALPA system 200 may identify any portions of the deposition transcript for which STT module 234 was unable to accurately determine the content of what was spoken, or for which speaker identification module 232 was unable to accurately identify a speaker. According to these examples, ALPA system 200 may send one or more deposition participants a deposition transcript proactively identifying each ambiguity, and request confirmation that the ambiguity-labeled content is accurate, or that the respective participant(s) supply a correction. In some examples, system 200 may send the deposition transcript with a time limit in which the participant(s) are required to respond. For example, system 200 may request (via email, via user interface(s) 109, or otherwise) that the participant type or speak what that participant believes was actually said during the deposition, after which those corrections may themselves be reviewed by one or more individuals for accuracy, and potentially contested, if there is a disagreement among the parties. In some examples, system 200 may be configured to analyze an identified ambiguity and provide one or more suggestions to resolve the ambiguity, which may be selected by the participants.
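As a minimal sketch of how ambiguous portions might be identified for review, the following Python example flags transcript segments whose content or speaker attribution falls below a confidence threshold. The `Segment` type, the two confidence scores, and the 0.80 threshold are assumptions made for illustration only; the disclosure does not specify how STT module 234 or speaker identification module 232 report uncertainty.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str               # participant label assigned by speaker identification
    text: str                  # text produced by speech-to-text conversion
    stt_confidence: float      # hypothetical 0..1 score for the converted text
    speaker_confidence: float  # hypothetical 0..1 score for the speaker label


def flag_ambiguities(segments, threshold=0.80):
    """Return (index, reasons) for every segment that needs participant review."""
    flagged = []
    for i, seg in enumerate(segments):
        reasons = []
        if seg.stt_confidence < threshold:
            reasons.append("content")   # what was said is uncertain
        if seg.speaker_confidence < threshold:
            reasons.append("speaker")   # who said it is uncertain
        if reasons:
            flagged.append((i, reasons))
    return flagged
```

Each flagged index could then be presented to the participants together with a request to confirm or correct the labeled passage.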

In some examples, audio storage module 230 maintains data reflecting at least a portion of audio captured during a deposition proceeding in a manner such that the recorded audio is associated with generated deposition text. In this manner, the respective deposition participants can use such an audio recording to reconcile any ambiguities in a transcript or transcript portion generated by transcript generator 240.

In some examples, if all deposition participants provide the same answer in response to identified ambiguity(ies) (or no ambiguities were detected), transcript generator 240 generates a final transcript that reflects the corrected ambiguity and sends the final transcript to all participants, notifies the participants that it is finalized, or makes it available via user interface(s) 109. In other examples, where the deposition participants do not agree on an identified ambiguity, transcript generator 240 generates a transcript that identifies the ambiguity as “in-dispute,” and sends the generated transcript to all participants or otherwise makes it available, as stated above.
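The agreement check described above can be sketched as follows. This is an illustrative assumption, not the disclosed implementation: the function name `resolve_ambiguity`, the response dictionary, and the choice to compare answers after case and whitespace normalization are all hypothetical.

```python
def resolve_ambiguity(responses):
    """Decide the status of one flagged passage.

    responses maps a participant name to the text that participant believes
    was spoken. Returns ("final", text) when every answer matches after
    normalization, otherwise ("in-dispute", None).
    """
    if not responses:
        raise ValueError("at least one participant response is required")
    # Normalize case and whitespace so trivial differences do not cause disputes.
    normalized = {" ".join(answer.lower().split()) for answer in responses.values()}
    if len(normalized) == 1:
        first_answer = next(iter(responses.values()))
        return "final", first_answer.strip()
    return "in-dispute", None
```

A stricter comparison (for example, exact string equality) would mark more passages as “in-dispute”; the normalization here is a design choice made only for the sketch.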

ALPA system 200 described above provides numerous advantages in comparison to prior techniques for recording deposition transcripts that require a trained and licensed court reporter. For example, using ALPA system 200 may enable parties to a deposition or other legal proceeding to generate a transcript at lower cost, because it is not necessary to hire an expensive court reporter to perform the task of generating a transcript. In addition, ALPA system 200 may work faster, and more efficiently, than a human court reporter. For example, ALPA system 200 may identify speakers and convert speech to text in real-time, thereby allowing a transcript to be generated immediately after the legal proceeding concludes, in comparison to a court reporter who may take days or weeks to review manually typed text and generate a final transcript. In addition, ALPA system 200 may provide for better accuracy than a human court reporter, and may enable fast and reliable correction (or at least identification) of ambiguities in generated transcript subject matter in a manner that avoids disputes between deposition participants.

FIGS. 3A to 3C are conceptual diagrams that depict a plurality of deposition participants, in this instance a policeman 103B and two attorneys 103A, 103C, their speech events being detected by a microphone incorporated into one of a computer or smart phone, in one embodiment, or in an alternative embodiment, by wired or wireless listening devices (microphones, not depicted here) which are themselves in communication with a smart phone or computer in accordance with some embodiments of the invention.

As shown in FIG. 3A, the speech of each of deposition participants 103A-103C is captured by a microphone 105 associated with a user interface 109 (e.g., a computing device such as a laptop, smartphone, or tablet computer). According to such an embodiment, speaker identification module 232 identifies, based on speech characteristics, an identity of respective speakers in the recorded audio.

FIG. 3B depicts an alternative embodiment, where each deposition participant is associated with a specific microphone 105A-105C. According to this example, each of microphones 105A-105C is coupled to a computing device (e.g., user interface 109), which is in turn coupled to a network 115 such as the internet. According to the example of FIG. 3B, where each deposition participant 103A-103C is associated with a particular microphone 105A-105C, speaker identification module 232 may identify a speaker in recorded audio based on which microphone recorded a particular audio segment. Alternatively, the speaker identification module 232 may identify a speaker based on one of the other voice recognition means discussed above.

FIG. 3C depicts one example where system 200 captures speech of deposition participants via a microphone 105 of a user interface device 109 (smartphone). As shown in FIG. 3C, for each participant 103A-103C, system 200 accesses one or more stored profiles 122 to associate recorded audio with a particular participant 103A-103C. If system 200 does not already have access to a stored profile, system 200 may create a profile for each new speaker 120, for example by requesting that the new user(s) read or repeat one or more phrases and analyzing the spoken phrases to create a user profile 122. In some embodiments a new user may not read or repeat a phrase, but a user profile will be generated dynamically during the course of the deposition. In some examples, user profiles may be stored locally (e.g., on user interface device 109), or remotely via a server computer coupled to system 200 via a network such as the internet.

The audio translation engine 207 may be remote, and audio data may be stored locally or remotely, including in a cloud based environment. The audio data may be stored in a location proximate to or remote from the audio translation engine, and the transcripts derived therefrom may also be stored locally or remotely from the audio translation engine and/or the audio-enabled devices. In one embodiment, the deposition data, including voice data, may be stored directly on an iPhone or other smart phone or computing device, which may or may not be configured as an audio translation engine 207 and/or a differentiation and association engine, and/or a server.

In another embodiment, where the smart phone or computing device is not so configured, one or more of these functions may be remotely performed on speech data recorded and/or transmitted during a deposition, or recorded during and transmitted after a deposition.

In one embodiment, audio translation engine 207 (e.g., speech to text module 234, and in some embodiments in conjunction with speaker identification module 232) uses voice recognition technology to identify words and create a transcript based on recorded audio file(s). Audio translation engine 207 detects the voice profile of a specific speaker that is either stored locally or which can be accessed from a remote database utilizing network means, and identifies the speech acts of that specific individual as distinct from any other speakers. In another embodiment, where the system 200 is not equipped to identify a specific speaker by a stored or otherwise known audio profile, that speaker can be identified to the system 200 by generating a new profile such that speech from that individual is thereafter associated with that individual.

In some examples, audio translation engine 207 (e.g., speaker identification module 232) parses individual voices from a recording containing the speech of multiple individuals, and individuals may be identified through a variety of means, including by data from a user-specific voice profile, which may include data that can help distinguish the speech acts of one speaker from the sometimes contemporaneous speech acts of other speakers.

Audio translation engine 207 (e.g., speaker identification module 232) may identify a participant speaker based on one or a plurality of factors, including voice pitch height, pitch modulation, pitch range, speech rate, fluency, vocabulary, grammar, usage and other speech patterns. Additionally, audio translation engine 207 may identify a user by other vocal traits, including measurements of the speaker's use of vowels, including (for example) average and standard deviation for fundamental frequency; period to period frequency; period to period amplitude variation; and GNE (glottal to noise excitation ratio), as examples. Other examples include pronunciation of known words, accent, intonation, speech speed, and user-specific word emphasis, or other physical or behavioral voice traits. Audio translation engine 207 (e.g., speaker identification module 232) may also identify a specific speaker by that speaker being pre-identified manually by anyone authorized to access user interface 109.
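Several of the vocal traits listed above can be computed from a sequence of glottal periods and per-period peak amplitudes. The sketch below is illustrative only: it assumes those two sequences have already been extracted from a sustained vowel by some pitch tracker (not shown), and it uses the common "local" definitions of jitter and shimmer (mean absolute difference between consecutive values, divided by the mean value), which the disclosure does not itself specify. The function names are hypothetical.

```python
from statistics import mean, pstdev


def relative_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def voice_features(periods_s, peak_amplitudes):
    """Summarize a voiced segment.

    periods_s: durations of consecutive glottal periods, in seconds.
    peak_amplitudes: peak amplitude of each period (any consistent unit).
    """
    f0 = [1.0 / p for p in periods_s]  # fundamental frequency of each period
    return {
        "f0_mean_hz": mean(f0),
        "f0_std_hz": pstdev(f0),
        "jitter": relative_perturbation(periods_s),         # period-to-period variation
        "shimmer": relative_perturbation(peak_amplitudes),  # amplitude variation
    }
```

A feature dictionary of this kind could be stored in a user-specific voice profile and compared against features measured from new audio.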

Any other vocal or sound characteristic for a speaker may be utilized by audio translation engine 207 (e.g., speaker identification module 232) without deviating from the scope of the invention. In one embodiment, and as an example, a plurality of speakers are identified as participating in a deposition or a court hearing. For each such speaker, one or more outlying speech traits are identified, and in some preferred embodiments, the speech traits are identified based on how meaningfully they differentiate that speaker from the other speakers in the room.
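One way to pick the trait that most meaningfully differentiates a speaker from the others in the room is to standardize each trait across the participants and select, for each speaker, the trait on which that speaker is the strongest outlier. The following sketch assumes this z-score approach and hypothetical trait names; the disclosure does not prescribe a particular selection method.

```python
from statistics import mean, pstdev


def most_distinctive_trait(profiles):
    """profiles maps speaker -> {trait name: numeric value}.

    Returns speaker -> the trait whose value lies furthest (in standard
    deviations) from the group mean for that trait.
    """
    traits = list(next(iter(profiles.values())).keys())
    stats = {}
    for trait in traits:
        values = [p[trait] for p in profiles.values()]
        stats[trait] = (mean(values), pstdev(values))

    result = {}
    for speaker, profile in profiles.items():
        def z_score(trait):
            group_mean, group_std = stats[trait]
            # A trait with no spread across speakers cannot differentiate anyone.
            return abs(profile[trait] - group_mean) / group_std if group_std else 0.0
        result[speaker] = max(traits, key=z_score)
    return result
```

For instance, a speaker with an unusually high-pitched voice would be keyed on pitch, while a speaker with an ordinary pitch but an unusually fast delivery would be keyed on speech rate.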

As one example, high pitched voices can be meaningfully and reliably differentiated from lower pitched voices. In addition to speech acts being identified as speech acts (sounds being identified as words, as opposed to sounds being identified as sounds, e.g., paper moving, chairs shifting, ambient noise, etc.), the words so identified may be further identified as being uttered by a particular individual (in preferred embodiments, a known individual).

In one embodiment, one or more users in advance of a deposition (for example) will utilize system 200 (e.g., speaker identification module 232) to identify themselves by name, and may associate themselves with a known voice profile (locally or remotely stored; accessible in real time or accessible post-deposition). In another embodiment, system 200 (e.g., speaker identification module 232) may utilize microphone(s) 105 themselves to identify a speaker participant among participants of the deposition.

For example, system 200 (e.g., speaker identification module 232) may associate one microphone device 105 with each deposition participant, and identify disparate speakers based on which microphone device 105 recorded the audio. For example, a specific audio input may be associated with one distinct individual or with a discrete set of individuals. In such an embodiment, a speaker may wear a microphone 105 that clips on to clothing (e.g., a shirt collar), or a body part (e.g., an ear piece), and the system 200 is configured to identify the speech events detected by that microphone as being the speech events of the speaker wearing the microphone, as distinct from the speech events of other speakers, who themselves may be wearing similar, user-specific microphones (as recognized by the system). In still other examples, system 200 may use microphones 105 that are not necessarily worn by participants; for example, tabletop or other microphones arranged in proximity to each respective speaker may be used to differentiate between the speech of respective deposition participants.
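The microphone-based association described above amounts to a lookup from an audio input to the participant, or discrete set of participants, assigned to it. A minimal sketch follows; the microphone identifiers, participant labels, and the decision to return `None` for a shared or unknown input (so that a caller can fall back to voice-profile matching) are assumptions for illustration.

```python
# Hypothetical assignment made before the deposition begins.
MIC_TO_PARTICIPANTS = {
    "mic-105A": ["Attorney 103A"],
    "mic-105B": ["Witness 103B"],
    "mic-105C": ["Attorney 103C"],
    "mic-table": ["Attorney 103A", "Attorney 103C"],  # one input, discrete set of people
}


def identify_speaker(mic_id, mapping=MIC_TO_PARTICIPANTS):
    """Return the participant for a microphone, or None if it is not unique."""
    candidates = mapping.get(mic_id, [])
    if len(candidates) == 1:
        return candidates[0]
    # Unknown or shared input: the caller would need another means
    # (for example, voice-profile comparison) to choose among candidates.
    return None
```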

In some cases a voice profile and the resulting translation will enjoy exceptional accuracy due to repeat use of system 200, and the ongoing capture and analysis of individual-specific and matter-specific (e.g., case specific) data. Repeat use of the system enables the audio translation engine 207 to draw upon a larger body of data (of the kind identified above), which in turn will yield more accurate transcripts. In addition, audio translation engine 207 may enable post-deposition correction(s) via user interfaces 109A-B of deposition transcripts that have been, for example, incorrectly or incompletely translated (for any reason) or where a portion of the transcript has been pre-flagged by audio translation engine 207 as being of questionable accuracy, for example due to the use of rare or hard to translate words. In another embodiment, audio translation engine 207 may ask a user, in advance of a legal proceeding, to read a standardized transcript that will be utilized by the translation engine 207 to differentiate that speaker from other speakers, by gathering voice data that assists in assigning speech acts to specific speakers in a room (e.g., voice pitch height and modulation, pitch range, speech rate, fluency, vocabulary, grammar, usage and other speech patterns).

In some instances, system 200 may incorporate, or access via networked means, data obtained from discovery and, in preferred embodiments, one or more indexed discovery databases associated with the case at issue in the deposition. Such databases, including indexed discovery databases, typically include documents and data regarding those documents (e.g., metadata) that are produced by parties during the course of a proceeding. For example, witnesses in a case or other individuals in possession of discoverable information relevant to a case often produce relevant documents and things in a variety of forms, including paper discovery (notebooks, notepads, sketches, and the like) and electronic discovery (i.e., eDiscovery, including information downloaded from servers, including email servers, backup tapes, local hard drives or flash drives). Electronically stored discovery may include documents that exist in many different file forms, including files utilized by word processing programs (e.g., doc, docx, dot files), Excel files (xls, xlsx), pdf files, tif image files, text files (txt), and photo image files (jpe, jpg, jpeg, etc.), among many others. In some instances, these files are gathered from document custodians and stored, and transformed/processed or analyzed using a variety of methods. Image files and pdf files, for example, may undergo optical character recognition (OCR) processing to determine whether they contain text, and to convert the text to an ASCII format. Metadata associated with any file may be stored in order to identify later who wrote the document and when, when it was edited, and to whom it was sent (as examples). Physically produced “hard” documents may be scanned to transform them into an electronic format which can then undergo further processing (e.g., OCR processing).

Once the documents and data are converted into a usable and searchable file format, if they were not already in such a format, the collective data may then be indexed, such that a document reviewer may efficiently search substantially all documents produced, processed and stored by a party in order to locate information and facts relevant to a litigation case, without an attorney having to physically read the documents. In a case involving asbestos, for example, the indexed documents may be searched for key words or the names of key individuals, such that the relevant documents may be readily identified.

In the context of the instant disclosure, system 200 may be linked by networked means to a discovery database for a particular case, and the data there obtained may be utilized by system 200 to, among other things, increase the accuracy of speech to text translation by STT module 234. By way of example, system 200 may be utilized to facilitate the deposition of a witness, Mr. Okerlund. System 200 may then query the discovery database of documents as a whole to identify the use of infrequently used terms, or in preferred embodiments documents specifically associated with Mr. Okerlund (e.g., associated utilizing metadata identifying emails and documents authored by Mr. Okerlund), and those documents may be analyzed by the system to identify language patterns particular to Mr. Okerlund, or the use of unusual or infrequently used words that have been used by Mr. Okerlund. STT module 234 may identify such words (in advance, during or after a deposition) as potential candidate terms for words spoken by Mr. Okerlund during his deposition that may be challenging to translate. More broadly speaking, system 200 may query the database as a whole to identify terms not typically present in everyday speech (and therefore more difficult to translate), but which may be used more frequently in a specific industry (e.g., complex pharmaceutical terms used in the context of a pharma patent dispute).

Examples include difficult words, terms, names, places, chemical names, or other problematic terms that may come up in association with a case. Where, for example, a document repository contains references to uniquely-named places (e.g., Punxsutawney, Pennsylvania) or difficult biological, technical, scientific or chemical terms (e.g., polysaccharides, immunoglobulin, dodecahedrane and the like) or any term (local idiom, for example) not commonly used in everyday speech, system 200 may proactively flag such terms from the indexed document production database. Audio translation engine 207 (e.g., speech to text module 234) may subsequently utilize these terms to increase the accuracy of the translation. In the same vein, system 200 may similarly index the word content of depositions associated with a case, such that uncommon or difficult words that have come up in the first (or earlier) deposition in a matter may be utilized to increase the accuracy of translations used in subsequent depositions.
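As an illustrative sketch of flagging uncommon terms from produced documents, the following example counts word occurrences across a set of document texts and keeps terms that recur but are absent from a list of everyday words. The function name, the `common_words` set (a stand-in for a real everyday-vocabulary list), and the `min_count` cutoff are assumptions; how the resulting terms would be supplied to a speech-to-text engine depends on that engine and is not shown.

```python
import re
from collections import Counter


def candidate_terms(documents, common_words, min_count=2):
    """Return recurring words that do not appear in the everyday-vocabulary set.

    documents: iterable of document texts (e.g., OCR output or email bodies).
    common_words: lowercase set standing in for an everyday-speech word list.
    """
    counts = Counter(
        word
        for text in documents
        for word in re.findall(r"[a-z]+", text.lower())
    )
    return sorted(
        word
        for word, count in counts.items()
        if count >= min_count and word not in common_words
    )
```

The same function could be run over the text of earlier deposition transcripts in a matter to collect difficult terms for later depositions.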

In another embodiment, system 200 may produce a transcript of a deposition that contains links from words in the deposition transcript to actual documents in an indexed discovery database where those same words occur. The system 200 may be utilized to produce a complete deposition transcript of Mr. Okerlund that is more accurate and usefully cross-referenced to an indexed database of discovery documents. In one embodiment, the transcript will be more accurate where Mr. Okerlund references the city of Punxsutawney (correctly identified by the system 200 as “Punxsutawney” in the converted transcript as opposed to “punks and tawny” due to the fact that the term “Punxsutawney” was among those identified in the indexed discovery database as being an uncommonly used term occurring multiple times in documents associated (e.g., via metadata) with Mr. Okerlund). Moreover, utilizing user interface 109, a user may click the mouse on uncommon terms in the electronic transcript (or terms identified by a user of the system 200), and the system will query or otherwise access the indexed discovery database to identify documents where that same word or phrase occurred. Thus, a user of the system may access Mr. Okerlund's deposition transcript and click on the term “Punxsutawney,” and system 200 may identify specific documents in the discovery database where this term occurred, and in preferred embodiments may call out in particular those documents specifically associated with Mr. Okerlund (e.g., Mr. Okerlund's emails, identified via metadata) where that term occurred. Where system 200 has active access to such an indexed discovery database during the course of a deposition, system 200 may dynamically search for documents in the discovery database by key word, and in such a way additional documents may be identified for use by an attorney utilizing system 200 during a deposition.
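The term-to-document lookup behind such transcript links can be sketched with a simple inverted index. The document identifiers, the `author` metadata field, and the ordering rule (documents associated with the named witness listed first) are assumptions made for this example rather than details taken from the disclosure.

```python
import re
from collections import defaultdict


def build_index(documents):
    """documents maps doc id -> {"author": str, "text": str}. Returns word -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for word in re.findall(r"[a-z]+", doc["text"].lower()):
            index[word].add(doc_id)
    return index


def documents_for_term(term, index, documents, author=None):
    """List documents containing the term, calling out the given author's documents first."""
    hits = sorted(index.get(term.lower(), ()))
    if author is None:
        return hits
    by_author = [d for d in hits if documents[d]["author"] == author]
    others = [d for d in hits if documents[d]["author"] != author]
    return by_author + others
```

A user interface could call `documents_for_term` when a user clicks a term in the electronic transcript and render the returned identifiers as links.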

As described above, audio translation engine 207 may receive an indication to start a deposition proceeding from a user, and perform an initialization procedure. In one embodiment, a user may initiate the system 200 by launching an application on a smart phone or computer, which may, in preferred embodiments, prompt a participant (often an attorney) to input (or select an existing) case or case caption, participant contact information, email addresses, etc. Audio translation engine 207 may prompt each participant (deponent and attorneys) to introduce themselves or identify themselves (if they've used the system before and have an existing profile). Audio translation engine 207 will then, utilizing any means (voice, microphone assigned and proximate to or attached to a speaker, etc.), identify each individual so that it can properly identify individuals and assign speech text to that individual, as opposed to other speakers.

Audio translation engine 207 may then prompt the participants to administer an oath or otherwise prompt an individual to electronically or verbally attest (using, for example, an e-signature or by giving verbal assent) to a pre-drafted oath. In some embodiments, the system is configured to recite an oath using an audio output device such as a speaker device, and the deponent is prompted to provide their verbal assent, which, along with the oath, is recorded and reflected in the transcript. Signatures may be given using a touch sensitive screen of a user interface 109, in one embodiment.

As the participants (e.g., attorneys and deponent) speak, the system 200, utilizing the apparatus and methods above, will detect speech acts of each speaker, record and translate them, and convert them into text. In a preferred embodiment, this may happen in real time, and the text can be corrected by a speaker in real time. For example, audio translation engine 207 (e.g., speech to text module 234) may translate speech captured by microphone(s) 105 in real time into text attributed to the respective user. Such real-time translated text may be displayed to the respective users via user interfaces 109. While the deposition is still proceeding, system 200 may provide users with the option to edit text to reflect what was said by a user, in the instance of errors.

In instances where multiple individuals speak at the same time, the system 200 may alert the parties and caution them about talking over one another. In some embodiments, however, it will be possible for the system 200 to parse out the disparate, contemporaneous speakers, and produce a transcript in any manner indicating that two speech acts were occurring at the same time or indicating there was overlap.
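Detecting that two participants spoke at the same time can be sketched as an interval-overlap check over speaker-attributed segments. The tuple layout `(speaker, start_s, end_s)` and the function name are assumptions for this example; the returned ranges could drive either an alert to the parties or an overlap notation in the transcript.

```python
def find_overlaps(segments):
    """Find stretches where two different speakers talk at once.

    segments: list of (speaker, start_s, end_s) tuples.
    Returns a list of (first_speaker, second_speaker, overlap_start, overlap_end).
    """
    ordered = sorted(segments, key=lambda seg: seg[1])  # order by start time
    overlaps = []
    for i, (spk_a, _start_a, end_a) in enumerate(ordered):
        for spk_b, start_b, end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # later segments start even later, so none can overlap
            if spk_b != spk_a:
                overlaps.append((spk_a, spk_b, start_b, min(end_a, end_b)))
    return overlaps
```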

In one embodiment, and in embodiments where, for example, each speaker has their own microphone 105 (which may or may not be associated by the system with a known or discrete speaker), the system 200 will contemporaneously time-stamp or otherwise mark all incoming audio data from multiple audio sources, such that audio data obtained from one microphone and associated with one known speaker will be marked with a time stamp (or functional equivalent) at the same time that audio data from other microphones, which are associated with other speakers, is also time-stamped. When the system 200 is fed data streams from multiple data sources (i.e., from different microphones), the system may identify what data was being generated at 3:15:03 PM from microphone 1 and ascertain and synchronize it with what data (audio data) was being generated at 3:15:03 PM from microphones 2 and 3 and 4 (or others). The system 200 may then utilize those time stamps in order to properly order the speech events, in any manner desired, in a system-generated transcript.
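Ordering speech events from several time-stamped microphone streams can be sketched as a merge of already-sorted sequences. The example below assumes each stream is a list of `(timestamp_s, text)` pairs on a common clock and in time order; the stream names and the function name are hypothetical.

```python
import heapq


def merge_streams(streams):
    """Interleave per-microphone events into one time-ordered sequence.

    streams: dict mapping mic id -> list of (timestamp_s, text), each list
    already sorted by timestamp. Returns a list of (timestamp_s, mic_id, text).
    """
    tagged = (
        [(timestamp, mic_id, text) for timestamp, text in events]
        for mic_id, events in streams.items()
    )
    # heapq.merge lazily merges sorted inputs; tuples compare by timestamp first.
    return list(heapq.merge(*tagged))
```

Because each event keeps its microphone identifier, the merged sequence can be passed directly to a step that maps microphones to participants.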

In an alternative embodiment, system 200 may synchronize multiple data sources not by analyzing a common time stamp (or equivalent) but by identifying across disparate data files an audio input that is substantially similar across the files. For example, in the case of multiple audio files with different time stamps or lengths or start and end times, where the system 200 is able to identify a sound (a door closing, a horn), or a noise with a unique or semi-unique data profile, and that sound occurs across multiple data files, the system 200 will be able to identify that point in both (or across several) recordings (or files), and then work backward and/or forward to synchronize the remainder of the files, thus “zippering” those disparate files, and the speech events that occurred on them, together. Other methods of synchronizing multiple audio files may also be utilized without departing from the scope of this disclosure.
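One common way to locate a shared sound in two recordings is cross-correlation: slide one signal against the other and keep the shift at which they agree best. The sketch below is a plain, unnormalized cross-correlation over short sample lists and is offered only as an assumption about how the "zippering" might be approached; a practical system would likely work on longer signals with an FFT-based or normalized variant.

```python
def best_offset(reference, other, max_shift):
    """Return the shift s (in samples) such that other[i] best matches reference[i + s].

    reference, other: sequences of audio samples at the same sample rate.
    max_shift: largest shift, in samples, to try in either direction.
    """
    def score(shift):
        total = 0.0
        for i, value in enumerate(other):
            j = i + shift
            if 0 <= j < len(reference):
                total += value * reference[j]  # agreement at this alignment
        return total

    return max(range(-max_shift, max_shift + 1), key=score)
```

Once the shift is known, timestamps in the second file can be offset by `shift / sample_rate` seconds so that events from both files share one timeline.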

Regardless of how the audio is obtained (all audio from a deposition, in one embodiment), whether by being captured in a single file or by capturing and synchronizing multiple files acquired across multiple audio detection devices (e.g., microphones), once these files are obtained, the system 200 may utilize them to create a transcript that accurately captures and orders speech events, which in preferred embodiments is rendered by attributing speech events to an identified speaker.

Once a deposition is complete, a participant (often an attorney) will utilize the system 200 to indicate that the deposition has concluded (e.g., via user interface 109). System 200 may forward a rough or complete transcript, or a notification that a transcript is available through a user interface, to all authorized parties requesting one (e.g., via e-mail). Where all processing is handled contemporaneously with the deposition, and there is an acceptable error rate, a transcript may follow immediately upon conclusion of the deposition. In some instances, additional processing may be required, especially where words are difficult to translate (proper names of people or places, foreign words, highly technical terminology that isn't readily translated). System 200 may present, via user interface 109, a list of terms to each speaker to clarify which term was intended. To ensure that no inappropriate or inaccurate post-deposition changes are made to the transcript, in some embodiments, system 200 preserves an audio recording of the deposition and applies a time stamp to both the audio recording and the translation, so there is no doubt of what was said if there is a difference of opinion among the participants.

In another embodiment, where the system is unable to identify a word from a data file (due to ambient noise, a plane flying overhead, etc.), or where the identification is tentative (below a pre-set confidence threshold for the translation), then the system 200 may automatically and proactively forward that data file or a portion of that data file to the speaker or to any other individual associated with that speech act, and that individual may listen to the original audio file and identify what it was they said. In another embodiment, where the original speaker is not available (or where otherwise desired), a human non-speaker translator may listen to the audio file and identify the words used. In some embodiments, system 200 may pull out of a larger audio file a smaller audio file or a series of snippets from a deposition and forward them in compressed or uncompressed and encrypted or unencrypted format to a translator, who can eliminate errors and verify the accuracy of the translation. In some embodiments, overseas translators may be utilized.
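Selecting which portions of a larger audio file to forward for review can be sketched as follows: find words whose confidence is below the pre-set threshold, pad each with a little surrounding audio, and merge ranges that touch. The word tuple layout, the 0.6 threshold, and the one-second padding are illustrative assumptions; cutting, compressing, or encrypting the audio itself is not shown.

```python
def snippet_ranges(words, threshold=0.6, padding_s=1.0):
    """Compute audio time ranges that cover tentatively identified words.

    words: list of (text, start_s, end_s, confidence) tuples in time order.
    Returns merged (start_s, end_s) ranges, each padded for listening context.
    """
    ranges = []
    for _text, start, end, confidence in words:
        if confidence >= threshold:
            continue  # identification is not tentative; no review needed
        low = max(0.0, start - padding_s)
        high = end + padding_s
        if ranges and low <= ranges[-1][1]:
            # Touches or overlaps the previous snippet: extend it instead.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], high))
        else:
            ranges.append((low, high))
    return ranges
```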

In one embodiment, system 200 gives the participants themselves an amount of time to read and sign the transcript. Once the transcript is signed, system 200 sends signed transcripts to each of the parties and stores them locally or in a cloud environment.

In one embodiment, the system 200 uses finished transcripts to increase accuracy of future depositions, especially where participants use the system in another deposition involving the same matter, wherein the same specialized language is utilized.

FIG. 4 is a conceptual diagram illustrating one example of an Automated Legal Proceeding Assistant (ALPA) system 400 consistent with one or more aspects of this disclosure. As shown in FIG. 4, system 400 is arranged to assist with a deposition with three participants 103A-103C. According to this example, each participant is associated with a respective microphone 105A-105C. As shown in FIG. 4, digital data representing recorded audio from the deposition proceeding is communicated over a network such as the internet to a speaker identification module 432. The speaker identification module 432 comprises software instructions stored in a tangible medium executable by a processor of a computing device, such as user interface(s) local to the deposition proceeding, or one or more remote server computing devices located remotely from the deposition proceeding and connected via a network such as the internet. As shown in FIG. 4, speaker identification module 432 includes a differentiation and association engine that maps recorded audio to one or more profiles associated with participants in the deposition. In this manner, the speaker identification module 432 assigns an identity to words and phrases included in the audio recording.

The assignment of an identity to recorded speech may be used, as also shown in FIG. 4, by audio translation engine 207 to generate a transcript 113 which reflects what was said by whom in the deposition.

FIG. 5 is a block diagram illustrating one example of an audio translation engine 207 consistent with one or more aspects of this disclosure. As depicted in FIG. 5, audio translation engine 507 is configured to receive a digital representation of an audio recording that includes speech captured by microphone(s) 105 as part of a deposition proceeding. As shown in FIG. 5, audio translation engine 507 performs a spectral analysis on the audio recording. As also shown in FIG. 5, audio translation engine 507 estimates a probability that the performed spectral analysis is correct. As also shown in FIG. 5, audio translation engine 507 performs analysis on the audio data, to compare it to verbal models, user specific profiles, and grammar models. As also shown in FIG. 5, based on the comparison, audio translation engine 507 identifies words in the audio data. As also shown, audio translation engine 507 builds a transcript based on the identified words. This is but one example of the class of audio translation engines that may be employed. Any system known in the art or hereinafter developed may be employed without departing from the scope of the invention.

FIG. 6 is a conceptual diagram that illustrates one example of data that may be stored at a server computing device of an ALPA system 200 consistent with one or more aspects of this disclosure. As shown in FIG. 6, server 602 is coupled to a network 601, such as the internet. As shown in FIG. 6, server 602 is coupled to or contains one or more storage devices 603, for example temporary memory such as random-access memory, or long-term storage such as a magnetic hard disc, flash memory, or the like.

Server 602 is configured to store user-specific data 604. As shown in FIG. 6, the user-specific data 604 may include user-specific voice recognition data 611, user-specific specialized vocabulary data 612, matter specific access data for a user 613, matter specific data 614, and user-associated deposition records 615. User-specific voice recognition data 611 may include one or more user speech profiles including speech parameters and characteristics that speaker identification module 232 uses to identify a speaker associated with a recorded audio segment. User-specific specialized vocabulary data 612 may include data indicating specific vocabulary used by a particular deposition participant user, which may be used by speaker identification module 232, speech to text module 234, or both. Matter specific data 614 may include data specific to a particular court or law firm matter associated with a particular deposition or plurality of deposition proceedings. By way of example, said matter specific data may include data obtained from discovery documents associated with a specific matter (i.e., a specific litigation case), such as unusual terminology or names that occur in produced documents. User-associated deposition records 615 may include information associated with a particular user, which may include information from multiple deposition proceedings across multiple cases or matters that involved a particular user.
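The categories of stored data described above can be sketched as a simple record type. The field names, field types, and the `vocabulary_hints` helper are hypothetical and serve only to show how user-specific vocabulary (612) and matter specific data (614) might be combined subject to the user's matter access (613); the disclosure does not define a storage schema.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    user_id: str
    voice_profile: dict = field(default_factory=dict)         # cf. voice recognition data 611
    specialized_vocabulary: set = field(default_factory=set)  # cf. vocabulary data 612
    matter_access: set = field(default_factory=set)           # cf. matter access data 613
    deposition_records: list = field(default_factory=list)    # cf. deposition records 615


def vocabulary_hints(user, matter_id, matter_vocabulary):
    """Combine a user's own vocabulary with a matter's vocabulary (cf. 614).

    matter_vocabulary maps matter id -> set of terms. Matter terms are included
    only when the user has access to that matter.
    """
    hints = set(user.specialized_vocabulary)
    if matter_id in user.matter_access:
        hints |= matter_vocabulary.get(matter_id, set())
    return hints
```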

FIG. 7 is a flow diagram illustrating one example of a method of automatically generating a legal proceeding transcript according to one or more aspects of this disclosure. At 701, the method includes recording, using a plurality of microphones each associated with a deposition participant of a plurality of deposition participants, the content of a deposition. The content of the deposition includes a plurality of speech segments recorded by the plurality of microphones. At 702, the method includes identifying, based on which microphone of the plurality of microphones each speech segment was recorded by, which deposition participant of the plurality of deposition participants is associated with each speech segment. In other examples not depicted in FIG. 7, the method may include identifying which deposition participant of the plurality of deposition participants is associated with each speech segment based on processing the recorded audio segments to compare speech properties to a predetermined profile representing the respective deposition participants. The method may further include converting the speech content of each recorded speech segment into written text. At 703, the method includes generating, based on which deposition participant of the plurality of deposition participants is identified as associated with each speech segment, a document comprising a transcript of the deposition, wherein the transcript comprises written text identifying sequentially what content was spoken and which deposition participant of the plurality of deposition participants spoke the content.
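
By way of a non-limiting illustration, steps 702 and 703 may be sketched as follows. The sketch assumes that each speech segment has already been converted to text and carries the identifier of the microphone that recorded it and a start time. The names SpeechSegment, generate_transcript, and mic_to_participant, the "UNKNOWN SPEAKER" fallback, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpeechSegment:
    """Hypothetical speech segment, assumed already converted to text."""
    microphone_id: str
    start_seconds: float
    text: str


def generate_transcript(segments, mic_to_participant):
    """Order segments by start time and label each with its speaker."""
    lines = []
    for segment in sorted(segments, key=lambda s: s.start_seconds):
        # Step 702: the recording microphone identifies the participant.
        speaker = mic_to_participant.get(segment.microphone_id, "UNKNOWN SPEAKER")
        # Step 703: append sequential, speaker-attributed text.
        lines.append(f"{speaker}: {segment.text}")
    return "\n".join(lines)


mic_to_participant = {"mic-1": "ATTORNEY", "mic-2": "WITNESS"}
segments = [
    SpeechSegment("mic-2", 4.2, "Yes, I did."),
    SpeechSegment("mic-1", 1.0, "Did you sign the agreement?"),
]
transcript = generate_transcript(segments, mic_to_participant)
print(transcript)
```

Sorting by start time before labeling keeps the transcript sequential even when segments from different microphones arrive out of order.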

FIG. 8 is a block diagram depicting generally a computing environment in which the ALPA system 200 described herein may operate. As shown in FIG. 8, the computing environment includes both a local computing device 810 and a network computing device 820. Local computing device 810 is a device located close to a legal proceeding such as a deposition, and may comprise a desktop, laptop, smartphone, or tablet computing device. Local computing device 810 may serve as a user interface 209, which allows one or more users of ALPA system 200 to interact with system 200, for example to receive messages, or to input instructions or information, before, during, or after a deposition. For example, as shown in FIG. 8, local computing device 810 includes a display 801 and an input interface 802. In the case where local computing device 810 comprises a laptop or desktop computer, input interface 802 may be a keyboard, mouse, trackpad, or the like. In cases where local computing device 810 is a smartphone or tablet computing device, input interface 802 may include a touchscreen display of the device configured to receive user input via touch.

As also shown in FIG. 8, local computing device 810 includes a processor 803, short-term memory 804, and long-term storage 805. Processor 803 comprises any computing device, such as a central processing unit (CPU), graphics processing unit (GPU), Application Specific Integrated Circuit (ASIC), field programmable gate array (FPGA) or the like capable of executing instructions to cause local computing device 810 to operate in an intended manner. Long-term storage 805 may comprise a tangible computer-readable medium configured to store data and program instructions capable of execution by processor 803. For example, long-term storage 805 may include one or more tangible media, such as a magnetic hard drive or flash memory hard drive. Short-term memory 804, which is also considered tangible media, is configured to temporarily store instructions and/or data for execution by processor 803.

In operation, program instructions stored in long-term storage 805 may be loaded into short-term memory 804, and executed via processor 803.

As shown in FIG. 8, the computing environment further includes remote computing device 820, which like local computing device 810, includes a processor 903, short-term memory 904, and long-term storage 905. Each of these components operates similarly to its counterpart in local computing device 810, with long-term storage 905 storing program instructions and/or data, which may be loaded into short-term memory 904 for execution by processor 903. Remote computing device 820 may be communicatively coupled to local computing device 810 via a network, such as the internet.

One of skill in the art will readily understand that any portion of the ALPA system 200 described herein may comprise program instructions executable by a processor of either local computing device 810 (processor 803) or remote computing device 820 (processor 903). For example, any components of audio processing engine 207, including audio storage module 230, speaker identification module 232, speech-to-text module 234, and transcript generator 240 may comprise program instructions stored in respective tangible media (804, 904) and executed solely by local computing device 810 or remote computing device 820, or in combination between local computing device 810 and remote computing device 820 without departing from the scope of this disclosure. Furthermore, system 200 may automatically generate legal proceeding transcripts using data stored at local computing device 810, remote computing device 820, or both. For example, the various data depicted in FIG. 6, including user profiles enabling the identification of the source of recorded speech, may be stored in local computing device 810, remote computing device 820, or any combination of local computing device 810 and remote computing device 820.

As one specific example, during a deposition proceeding, each participant to the deposition proceeding may have access to a local computing device 810 (user interface 109) that includes instructions stored in short-term memory 804 or long-term storage 805 to cause a software application to execute on processor 803. The software application may serve as an interface for the respective deposition participants to interact with system 200. The software application may, for example, provide users with selectable prompts such as to initialize a deposition proceeding, to submit oaths, to assign microphones 105 to deposition participants, to commence a deposition proceeding, or to conclude the deposition proceeding.

According to this example, local computing device(s) 810 may be coupled to one or more microphone(s) 105, which may be either included in the respective local computing device(s) 810, or communicatively coupled to the respective local computing device(s) 810. The software application may receive one or more digital representations of recorded audio data as one or more audio segments. The software application may send the recorded audio data to remote computing device 820 via network 806. According to this example, audio storage module 230 may execute on processor 803 of local computing device 810 to prepare and send the audio data to remote computing device 820. For example, audio storage module 230 executing on local computing device 810 may encode audio data to reduce a transmission size of the audio data. As another example, audio storage module 230 executing on local computing device 810 may encrypt received audio data to improve the security of transmission of the audio data.

At least a portion of audio storage module 230 may include software instructions stored in a tangible medium (short-term memory 904, long-term storage 905) of remote computing device 820, and may be operable to receive transmitted audio data and store it (e.g., in short-term memory 904, long-term storage 905) for processing.
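
By way of a non-limiting illustration, the preparation of audio data at local computing device 810 and its receipt at remote computing device 820 may be sketched as follows. The sketch uses general-purpose lossless compression from the Python standard library as a stand-in for whatever audio encoding is actually employed, and it omits encryption, which in practice would rely on a vetted cryptographic library or an encrypted transport. The function names package_segment and unpack_segment and the message fields are hypothetical.

```python
import base64
import json
import zlib


def package_segment(audio_bytes: bytes, microphone_id: str, timestamp: float) -> str:
    """Compress an audio segment and wrap it with identifying metadata."""
    compressed = zlib.compress(audio_bytes)  # stand-in for an audio codec
    return json.dumps({
        "microphone_id": microphone_id,  # which microphone recorded the segment
        "timestamp": timestamp,          # used later to order the transcript
        "audio": base64.b64encode(compressed).decode("ascii"),
    })


def unpack_segment(payload: str):
    """Reverse package_segment on the receiving device."""
    message = json.loads(payload)
    audio_bytes = zlib.decompress(base64.b64decode(message["audio"]))
    return message["microphone_id"], message["timestamp"], audio_bytes


raw = bytes(1000)  # placeholder audio: 1000 zero bytes
payload = package_segment(raw, "mic-1", 12.5)
mic, ts, restored = unpack_segment(payload)
print(mic, ts, len(restored))
```

Carrying the microphone identifier and timestamp alongside each segment is what allows the receiving device to attribute and order speech without inspecting the audio itself.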

According to this example, speaker identification module 232 and speech-to-text module 234 may include executable program instructions stored in a tangible medium (short-term memory 904, long-term storage 905) and executable on a processor 903 of remote computing device 820. Speaker identification module 232 may cause remote computing device 820 to associate respective deposition participants with speech contained in the stored audio recordings, and speech-to-text module 234 may process the stored audio to convert recorded speech into representative text. According to this example, transcript generator 240 also includes program instructions stored in a tangible medium (short-term memory 904, long-term storage 905) and executable on a processor 903 of remote computing device 820 that cause remote computing device 820 to generate a document comprising a transcript that represents sequentially what was said during the deposition proceeding, and who said it.

In an example, once an initial transcript is generated, transcript generator 240 executing on remote computing device 820 sends the generated transcript document, or a message alerting the participants to its availability, to one or more deposition participants via network 806. For example, remote computing device 820 may send the generated transcript, or notice of its availability, to the respective participants through the previously described software application executing on local computing device 810. As previously described, the generated transcript may include identifications of one or more ambiguities in the transcript that could not be resolved with a high probability of accuracy. In some examples, the software application may give the deposition participants a time window in which to accept, reject, or provide feedback with respect to the generated transcript, including any identified ambiguities. In some examples, once all deposition participants have responded to either clarify all identified ambiguities or accept the initial transcript, the software application executing on local computing device 810 may send an indication to generate a final transcript to remote computing device 820. Remote computing device 820 may then generate the final deposition transcript, including resolving identified ambiguities based on deposition participant feedback received through the software application. The final deposition transcript may be sent to the participants via network 806 through the software application executing on the local computing device 810.
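
By way of a non-limiting illustration, the review step described above may be sketched as follows. The sketch assumes that the initial transcript is a list of words, that ambiguities are identified by word position, and that each participant responds either with an empty set of corrections (acceptance) or with replacement text for particular positions. The names ready_for_final and build_final are hypothetical, and the rule that a later response overrides an earlier one for the same position is an assumption, since the manner of reconciling conflicting feedback is not specified above.

```python
def ready_for_final(responses, participants):
    """True once every participant has accepted or provided feedback."""
    return all(participant in responses for participant in participants)


def build_final(draft_words, ambiguous_positions, responses):
    """Apply participant feedback to the ambiguous positions of the draft."""
    final_words = list(draft_words)
    for feedback in responses.values():
        for position, replacement in feedback.items():
            # Only positions flagged as ambiguous may be changed (assumed rule).
            if position in ambiguous_positions:
                final_words[position] = replacement
    return " ".join(final_words)


participants = ["ATTORNEY", "WITNESS"]
draft_words = ["The", "contract", "was", "[unclear]", "in", "March"]
ambiguous_positions = {3}

responses = {"ATTORNEY": {}}  # attorney accepts the draft as-is
pending = ready_for_final(responses, participants)

responses["WITNESS"] = {3: "signed"}  # witness clarifies the ambiguity
complete = ready_for_final(responses, participants)
final_text = build_final(draft_words, ambiguous_positions, responses)
print(final_text)
```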

What is claimed is:
 1. A method comprising: receiving an output signal from one or more microphones, the output signal representing content from a deposition proceeding having two or more participants; storing the received output signal from the one or more microphones in memory, wherein the received output signal is stored as a plurality of audio files representing the entire deposition proceeding; communicating each of the plurality of audio files to a remote server to generate a document comprising a transcript of the deposition, wherein each of the plurality of audio files includes a timestamp that identifies a time associated with each of the plurality of audio files; generating a document comprising a transcript of the deposition based on the plurality of audio files, wherein the timestamps associated with each of the plurality of audio files are utilized to correctly order the transcript and wherein the identified deposition participation is included in the transcript; and communicating the document comprising the transcript to one or more participants of the deposition.
 2. The method of claim 1, wherein generating the document comprising the transcript includes identifying one or more unclear portions in the transcript.
 3. The method of claim 2, wherein the document comprising the transcript prompts participants of the deposition to accept, reject, and/or provide feedback regarding the one or more unclear portions.
 4. The method of claim 1, further including compressing each of the plurality of audio files to reduce a size of each audio file prior to communicating to the remote server.
 5. The method of claim 1, further including encrypting each of the plurality of audio files.
 6. The method of claim 1, wherein generating the document comprising the transcript of the deposition includes identifying a deposition participant speaking during the deposition proceeding and including the identified speaker as part of the transcript.
 7. The method of claim 6, wherein each deposition participant is associated with one of the plurality of microphones, and wherein each of the plurality of audio files communicated to the remote server includes an identification of the microphone utilized to record the audio file.
 8. The method of claim 7, wherein the deposition participant is identified based at least in part on the identification of the microphone utilized to record the audio file.
 9. The method of claim 6, wherein the remote server stores a user profile associated with each deposition participant, wherein deposition participants are identified based at least in part on the stored user profiles.
 10. The method of claim 1, further including: displaying the document comprising the transcript to the deposition participants via one or more displays.
 11. The method of claim 10, further including: receiving feedback via the one or more displays to accept, reject and/or modify portions of the transcript.
 12. The method of claim 1, further including: detecting, via a first microphone, that a first participant is speaking; detecting, via a second microphone, that a second participant is speaking at the same time as the first participant; and outputting, via a display visible to the first participant and the second participant, a warning that multiple participants are speaking contemporaneously.
 13. The method of claim 12, further including: outputting, via the display visible to the first participant and the second participant, a warning that the deposition is paused until the first participant and the second participant indicate they wish to continue; and pausing until an indication is received from the first participant and second participant that they wish to continue with the deposition.